Skip to content

BUG: inconsistent concat casting EA vs non-EA #38843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 1, 2021

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Because we dont have 2D EAs, we end up passing axis=0 to concat_compat from internals.concat. As a result we drop empty arrays only when EAs are present. This leads to dtypes being preserved in EA cases while being cast to object in non-EA cases.

One solution is just to support 2D EAs. But since I like the dropping-empties behavior better anyway, this is good too.

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

can you add a whatsnew note

@jreback jreback added ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 31, 2020
@jreback jreback added this to the 1.3 milestone Dec 31, 2020
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 31, 2020

BTW, originally this is not directly related to having 2D EAs or not. The inconsistency already existed between DataFrame vs Series (because when concatting Series/SingleBlockManager, axis=0 is used for any dtype). So when concatting Series, empties are ignored, and when concatting DataFrames, empties are not ignored.
Only, now with extension dtypes (and maybe before with non-consolidatable block as well), also for DataFrame columns involving EAs, empties are dropped, creating an additional inconsistency within concatting DataFrames.

But so we should also tackle the general inconsistency between DataFrame and Series (which this PR might be doing already, but that should probably be made more explicit).

For the actual behaviour we want, it's also an option to go for the DataFrame behaviour of not dropping empties.
I think there are also good arguments for this behaviour: the resulting dtype of a concat-operation only depends on the input dtypes, and not on the exact content (the exact values, how many values (shape)). In general we want to get rid of value-dependent behaviour.

@jbrockmendel
Copy link
Member Author

this is not directly related to having 2D EAs or not

Yes it is. We pass axis=0 when we have EAs as a kludge bc they aren't 2D.

For the actual behaviour we want, it's also an option to for the DataFrame behaviour of not dropping empties [...] In general we want to get rid of value-dependent behaviour.

That's not unreasonable.

@jreback
Copy link
Contributor

jreback commented Dec 31, 2020

hmm I though we did drop empties on DataFrame, but i guess not. yeah would be +1 on making that change for consistency.

@jreback jreback merged commit 2362df9 into pandas-dev:master Jan 1, 2021
@jbrockmendel jbrockmendel deleted the bug-nonempties branch January 2, 2021 01:36
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 4, 2021

I think this was merged a bit too quickly. As noted above (#38843 (comment)), IMO there is much to say for solving it the other way around: changing Series behaviour to match the DataFrame behaviour (of not dropping empties).


this is not directly related to having 2D EAs or not

Yes it is. We pass axis=0 when we have EAs as a kludge bc they aren't 2D.

I only wanted to mention that, even without EAs at all, we have an inconsistency in behaviour caused by the axis=0 check. And this inconsistency already existed before EAs, and it's something that needs to be solved regardless. But yes, for EA dtypes, this causes an additional inconsistency, as I acknowledged.

BTW, as general advice, I would recommend to stop constantly calling 1D EAs a "kludge", it's not helping the discussion in any way.

@jbrockmendel
Copy link
Member Author

BTW, as general advice, I would recommend to stop constantly calling 1D EAs a "kludge", it's not helping the discussion in any way.

The 1D EAs themselves are not the kludge, it is the workarounds that become necessary as a result of 1D EAs. In this case passing axis=0 to concat_compat.

@jorisvandenbossche
Copy link
Member

(Understood, but in general comments, it's not always clear to the reader which of both you mean, so I keep my advice ;))

Anyway, more importantly, can I get some response on the actual issue that I raised (the API change for concat with empties for DataFrame vs Series) ?

@jbrockmendel
Copy link
Member Author

the API change for concat with empties for DataFrame vs Series

I'm open to it.

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jan 8, 2021
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Jan 8, 2021
jreback pushed a commit that referenced this pull request Jan 8, 2021
* Revert "BUG: casting on concat with empties (#38907)"

This reverts commit 04282c7.

* Revert "BUG: inconsistent concat casting EA vs non-EA (#38843)"

This reverts commit 2362df9.
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
)

* Revert "BUG: casting on concat with empties (pandas-dev#38907)"

This reverts commit 04282c7.

* Revert "BUG: inconsistent concat casting EA vs non-EA (pandas-dev#38843)"

This reverts commit 2362df9.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants